Abstract. This is a survey of several exciting recent results in which techniques
originating in the area known as additive combinatorics have been applied to give
results in other areas, such as group theory, number theory and theoretical
computer science. We begin with a discussion of the notion of an approximate
group and also that of an approximate field, describing key results of
Freiman-Ruzsa, Bourgain-Katz-Tao, Helfgott and others in which the structure of
such objects is elucidated. We then move on to the applications. In particular
we will look at the work of Bourgain and Gamburd on expansion properties of
Cayley graphs on $\mathrm{SL}_2(\mathbb{F}_p)$ and at its application in the work
of Bourgain, Gamburd and Sarnak on nonlinear sieving problems.



1. Introduction 

The subject of additive combinatorics has grown enormously over the last ten 
years and now comprises a large collection of tools with many applications in num- 
ber theory and elsewhere, for example in group theory and theoretical computer 
science. It has often been thought a little difficult to specify to an outsider exactly
what the subject is.¹ However the following point of view seems to be gradually
crystallising: additive combinatorics is the study of approximate mathematical
structures such as approximate groups, rings, fields, polynomials and homomor- 
phisms. It is interested in what the right definitions of these approximate structures 
are, what can be said about them, and what applications this has to other parts of 
mathematics. 

This article has three main aims. Firstly, we wish to introduce the above point 
of view to a general audience, focussing in particular on the basic theory of approx- 
imate groups and approximate fields. Secondly, we wish to sketch some beautiful 
applications of these ideas. One of them has to do with the beautiful picture on the 
cover (for which we thank Cliff Reiter) of an Apollonian circle packing. It is classical 
that the radii of these circles are all reciprocals of integers. We will describe work 
of Bourgain, Gamburd and Sarnak giving upper bounds for the number of circles 
at "depth n" which have radius the reciprocal of a prime. Thirdly, we wish to hint 
at the extraordinary variety of different areas of mathematics which have started 
to interact with additive combinatorics: geometric group theory, analytic number 
theory, model theory and point-set topology are just the ones we shall mention 
here. 



2000 Mathematics Subject Classification. Primary . 

This article was written while the author was a fellow at the Radcliffe Institute at Harvard. It
is a pleasure to thank the institute for its support and excellent working conditions. 
1 See, for example, my own attempt in the opening remarks of [26]. 






BEN GREEN 



What we offer here is merely a taste of this viewpoint of additive combinatorics 
as the theory of approximate structure and of its applications. We do not touch 
on the theory of approximate polynomials (a.k.a. the theory of Gowers norms) 
or say much at all about approximate homomorphisms, or anything about the 
many applications of these two notions. These topics will be covered in detail in 
forthcoming lecture notes of the author [29] . 

2. Approximate groups 

Before we can define an approximate group, we need to recall what an exact one 
is. We shall be concerned with finite groups, and we shall be working inside some 
ambient group G, so that it makes sense to talk about multiplication of elements 
and taking inverses. If $A \subseteq G$ is a finite set then we shall write
$A \cdot A := \{a_1a_2 : a_1, a_2 \in A\}$ and $A \cdot A^{-1} := \{a_1a_2^{-1} : a_1, a_2 \in A\}$.
Later on we shall see more general notations such as $A \cdot A \cdot A$ and
$A \cdot B$ whose meaning, we hope, will be evident. The
following proposition, whose proof is an exercise in undergraduate group theory, 
gives various criteria for A to be a subgroup or something very closely related. 

Proposition 2.1. Let A be a finite subset of some ambient group G. Then we have
the following statements.²

(1) $|A \cdot A^{-1}| \geq |A|$, with equality if and only if $A = Hx$ for some subgroup
$H \leq G$ and some element $x \in G$;

(2) $|A \cdot A| \geq |A|$, with equality if and only if $A = Hx$ for some subgroup $H \leq G$
and some element $x$ in the normaliser $N_G(H)$;

(3) The number of quadruples $(a_1, a_2, a_3, a_4) \in A^4$ with $a_1a_2^{-1} = a_3a_4^{-1}$ is at
most $|A|^3$, with equality if and only if $A = Hx$ for some subgroup $H \leq G$
and some element $x \in G$;

(4) The number of quadruples $(a_1, a_2, a_3, a_4) \in A^4$ with $a_1a_2 = a_3a_4$ is at most
$|A|^3$, with equality if and only if $A = Hx$ for some subgroup $H \leq G$ and
some element $x \in N_G(H)$;

(5) $P(a_1a_2 \in A \mid a_1, a_2 \in A) \leq 1$, with equality if and only if $A = H$ for some
subgroup $H \leq G$;

(6) $P(a_1a_2^{-1} \in A \mid a_1, a_2 \in A) \leq 1$, with equality if and only if $A = H$ for some
subgroup $H \leq G$.

This would be a rather odd proposition to see formulated in an algebra text.
However each of the statements (1) - (6) has been constructed as an inequality in
such a way that one may ask when equality approximately holds. Before we can
talk about such approximate variants, however, we need to know how approximate
they will be. For this purpose we introduce a parameter $K \geq 1$; larger values of K
will indicate more approximate, and thus less structured, objects.³



2 Statements (5) and (6) look "probabilistic" but this is just a notation. By
$P(a_1a_2 \in A \mid a_1, a_2 \in A)$ we mean simply the proportion of all pairs
$a_1, a_2 \in A$ for which $a_1a_2$ also lies in A.

3 In practice the theory when $K \approx 1$ is very different from the theory when, for example,
$K \sim 100$. In the former setting, these approximate notions of subgroup constitute very small
perturbations of the exact characterisations of Proposition 2.1, and it turns out (though it is not
always trivial to prove) that the approximate objects so defined are small perturbations of the
exact objects characterised by Proposition 2.1. In conversation Tao and I tend to refer to this
regime as "the 99% world", an expression I would not be averse to popularising. In this paper
K will be much larger, causing the theory to become much richer. Tao and I call this the "1%
Consider, then, the following list of properties that a finite set $A \subseteq G$ might
enjoy.

(1) $|A \cdot A^{-1}| \leq K|A|$;

(2) $|A \cdot A| \leq K|A|$;

(3) The number of quadruples $(a_1, a_2, a_3, a_4) \in A^4$ with $a_1a_2^{-1} = a_3a_4^{-1}$ is at
least $|A|^3/K$;

(4) The number of quadruples $(a_1, a_2, a_3, a_4) \in A^4$ with $a_1a_2 = a_3a_4$ is at least
$|A|^3/K$;

(5) $P(a_1a_2 \in A \mid a_1, a_2 \in A) \geq 1/K$;

(6) $P(a_1a_2^{-1} \in A \mid a_1, a_2 \in A) \geq 1/K$.

Now these are by no means as closely equivalent as the properties (1) - (6) in
Proposition 2.1. Let us give an example in which the ambient group is $\mathbb{Z}$, and
where we use additive rather than multiplicative notation. Take
$A = \{1, \dots, n\} \cup \{2^{n+1}, 2^{n+2}, \dots, 2^{2n}\}$. Then it is easy to check that (3) and (4)
are both satisfied with any $K > 12$, as $n \to \infty$, this being because there are
$\frac{2}{3}n^3(1 + o(1))$ solutions to $a_1 + a_2 = a_3 + a_4$ with $a_1, a_2, a_3, a_4 \in \{1, \dots, n\}$.
On the other hand the sumset $A + A$ contains the numbers $2^{n+i} + j$ for each pair
$i, j$ with $0 < i, j \leq n$. Since these numbers are all distinct, we have
$|A + A| \geq n^2 = |A|^2/4$, which means that if
n is sufficiently large depending on K then (2) is not satisfied at all.
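The example can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the values of n are arbitrary small choices. It counts solutions to $a_1 + a_2 = a_3 + a_4$ in A and compares the sumset size with $|A|$.

```python
from collections import Counter

def additive_energy(A):
    """Number of quadruples (a1, a2, a3, a4) in A^4 with a1 + a2 = a3 + a4."""
    sums = Counter(a + b for a in A for b in A)
    return sum(c * c for c in sums.values())

def example_set(n):
    """The set {1, ..., n} together with {2^(n+1), ..., 2^(2n)}."""
    return list(range(1, n + 1)) + [2 ** (n + i) for i in range(1, n + 1)]

def report(n):
    """Return (|A|, additive energy of A, |A + A|) for the example set."""
    A = example_set(n)
    sumset = {a + b for a in A for b in A}
    return len(A), additive_energy(A), len(sumset)

for n in (10, 20, 40):
    size, energy, sumset_size = report(n)
    # First ratio stays below 12 (property (4) with K = 12);
    # second ratio grows with n (property (2) fails for fixed K).
    print(n, size ** 3 / energy, sumset_size / size)
```

The first printed ratio is the smallest K for which property (4) holds, and the second is the smallest K for which property (2) holds; only the latter is unbounded as n grows.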

Rather remarkably, however, there is a sense in which the concepts (1) - (6) are
all roughly the same. To say what we mean by that, we introduce the following
notion of rough equivalence.⁴

Definition 2.2 (Rough Equivalence). Suppose that A and B are two finite sets
in some ambient group and that $K \geq 1$ is a parameter. Then we write $A \sim_K B$
to mean that there is some x in the ambient group such that
$|A \cap Bx| \geq \max(|A|, |B|)/K$. We say that A and B are roughly equivalent (with parameter
K).

The remarkable fact alluded to above is the following. For every choice of
$j, j' \in \{1, \dots, 6\}$, suppose that some set A satisfies condition (j) in the list above with
parameter K. Then there is a set B satisfying condition (j') with parameter
$K' = \mathrm{poly}(K)$ (some polynomial in K) such that $A \sim_{K'} B$. Of particular note is the
fact that the weak "statistical" properties (3) - (6) imply the apparently more
structured properties (1) and (2). The proof of this is not at all trivial and the main
content of it is the so-called Balog-Szemerédi-Gowers theorem [22], generalised to
the nonabelian setting in the fundamental paper of Tao, as well as a collection
of "sumset estimates" which, in the abelian case, I refer to collectively as Ruzsa
calculus [21]. These estimates of Ruzsa have such a classical role in the theory that
we record two of them, in the abelian setting, here: we will mention these two again
later on.



world" although the parameter K could be anything between 2 (say) and some small power of 
|A|. 

4 The fact that we have written Bx rather than xB is a little arbitrary. The notion of rough
equivalence will, in this survey, be applied to classes of sets (such as (1) - (6) here) which are 
invariant under conjugation, in which case whether we multiply on the left or on the right in the 
definition makes little difference. 
Theorem 2.3 (Ruzsa). Suppose that $A_1$, $A_2$ and $A_3$ are finite sets in some ambient
abelian group. Then $|A_1||A_2 - A_3| \leq |A_1 - A_2||A_1 - A_3|$ and
$|A_1 + A_1| \leq |A_1 - A_1|^3/|A_1|^2$.

The original paper [JT], the book [S3] or the notes [2SJ may be consulted for 
more details. The first estimate is true in general groups but adapting the second 
requires care: see [59] . 
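Both inequalities in Theorem 2.3 are easy to test on small sets of integers. The sketch below is illustrative only: it checks the triangle inequality $|A_1||A_2 - A_3| \leq |A_1 - A_2||A_1 - A_3|$ and the estimate $|A_1 + A_1||A_1|^2 \leq |A_1 - A_1|^3$ on random subsets of $\{0, \dots, 199\}$. The helper names, the seed and the ranges are assumptions made for the example.

```python
import random

def difference_set(A, B):
    """The set A - B of all differences a - b."""
    return {a - b for a in A for b in B}

def sum_set(A, B):
    """The set A + B of all sums a + b."""
    return {a + b for a in A for b in B}

def ruzsa_triangle_holds(A1, A2, A3):
    """Check |A1| |A2 - A3| <= |A1 - A2| |A1 - A3|."""
    lhs = len(A1) * len(difference_set(A2, A3))
    rhs = len(difference_set(A1, A2)) * len(difference_set(A1, A3))
    return lhs <= rhs

def sum_versus_difference_holds(A):
    """Check |A + A| |A|^2 <= |A - A|^3, written without division."""
    return len(sum_set(A, A)) * len(A) ** 2 <= len(difference_set(A, A)) ** 3

random.seed(0)
trials = [
    [set(random.sample(range(200), random.randint(1, 30))) for _ in range(3)]
    for _ in range(200)
]
all_ok = all(
    ruzsa_triangle_holds(*sets) and sum_versus_difference_holds(sets[0])
    for sets in trials
)
print(all_ok)
```

A numerical check of this kind proves nothing, of course, but it is a quick way to catch a misremembered exponent.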

It might be remarked that for many pairs (j) and (j') the correspondence between
the relevant properties is a little tighter than mere rough equivalence, and often 
this can be useful. We shall not dwell on this point here. In the paper of Tao just 
mentioned one finds what has become the "standard" notion of an approximate 
group. 

Definition 2.4 (Approximate group). Suppose that A is a finite subset of some
ambient group and that $K \geq 1$ is a parameter. Then we say that A is a
K-approximate group if it is symmetric (that is, if $a \in A$ then $a^{-1} \in A$, and the
identity lies in A) and if there is a set X in the ambient group with $|X| \leq K$ and
such that $A \cdot A \subseteq X \cdot A$.

This notion, it turns out, is roughly equivalent to (1) - (6) above. It has certain
advantages over (1) - (6), for example as regards its behaviour under homomorphisms.
It is also clear that an approximate group in this sense enjoys good control
of iterated sumsets. Thus, for example, $A \cdot A \cdot A \subseteq X \cdot X \cdot A$, which means that
$|A^3| = |A \cdot A \cdot A| \leq K^2|A|$, and similarly $|A^n| \leq K^{n-1}|A|$, where $A^n$ denotes the set
of all products $a_1 \cdots a_n$ with $a_1, \dots, a_n \in A$. From now on, when we speak of an
approximate group, we will be referring primarily to Definition 2.4.
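As a concrete illustration of Definition 2.4 and of the bound $|A^n| \leq K^{n-1}|A|$, here is a small sketch in the additive group $\mathbb{Z}$. The interval $A = \{-N, \dots, N\}$ and the covering set $X = \{-N, N\}$ are assumptions chosen for the example, so that $K = 2$; the function names are hypothetical.

```python
def sum_set(A, B):
    """The set A + B (the 'product set' in additive notation)."""
    return {a + b for a in A for b in B}

def is_approximate_group(A, X):
    """Check Definition 2.4 for subsets of Z, written additively:
    A is symmetric, contains the identity 0, and A + A lies inside X + A."""
    symmetric = 0 in A and all(-a in A for a in A)
    return symmetric and sum_set(A, A) <= sum_set(X, A)

def iterated_sum_set(A, n):
    """The n-fold sumset A + ... + A."""
    result = set(A)
    for _ in range(n - 1):
        result = sum_set(result, A)
    return result

N = 50
A = set(range(-N, N + 1))   # a symmetric interval around 0
X = {-N, N}                 # two translates of A cover A + A
K = len(X)

print(is_approximate_group(A, X))
for n in range(1, 6):
    print(n, len(iterated_sum_set(A, n)), K ** (n - 1) * len(A))
```

Here the iterated sumsets grow only linearly in n, far more slowly than the bound $K^{n-1}|A|$ allows; the bound is what survives in nonabelian settings.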

With this discussion in mind, we can introduce what might be termed the rough 
classification problem of approximate group theory. 

Question 2.5. Consider the collection $\mathcal{C}$ of all K-approximate groups A in some
ambient group G. Is there some "highly structured" subcollection $\mathcal{C}'$ such that
every $A \in \mathcal{C}$ is roughly equivalent to some set $B \in \mathcal{C}'$ with parameter K', where K'
depends only on K?

This question has been addressed in a great many different contexts, starting 
with the Freiman- Ruzsa theorem US], which gives an answer for subsets of 
$\mathbb{Z}$. Here, it is possible to take $\mathcal{C}'$ to consist of the so-called generalised arithmetic
progressions, that is to say sets B of the form

$$B := \{l_1x_1 + \dots + l_dx_d : l_i \in \mathbb{Z}, |l_i| \leq L_i\},$$

where $x_1, \dots, x_d \in \mathbb{Z}$, the quantities $L_1, \dots, L_d$ are "lengths" and $d \leq K$. Note in
particular that, even in the highly abelian setting of the integers Z, approximate 
groups are a more general kind of object than genuine subgroups. That is, the 
theory of approximate groups, even up to rough equivalence, is a little richer than 
the theory of finite subgroups of Z (which is in fact a rather trivial theory) . The 
remarkable feature of the Freiman-Ruzsa theorem is that the theory is not much
richer, in the sense that generalised progressions remain highly "algebraic" objects. 
Here is a list of other contexts in which the question has been at least partially 
answered: 

• abelian groups [3D]; 



APPROXIMATE GROUPS AND THEIR APPLICATIONS 






• nilpotent and solvable groups [T0 | HT j H9 l 150 ] l6T] : 

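• free groups [46];

• linear groups: $\mathrm{SL}_2(\mathbb{R})$ [TB], $\mathrm{SL}_2(\mathbb{C})$ H3H22HSI, $\mathrm{SL}_3(\mathbb{Z})$ pa,
$\mathrm{SL}_3(\mathbb{C})$ (sketched in [34]), "bounded" subsets of $\mathrm{SL}_n(\mathbb{C})$ including
$\mathrm{U}_n(\mathbb{C})$ [T2"], $\mathrm{SL}_2(\mathbb{F}_p)$ [33], $\mathrm{SL}_3(\mathbb{F}_p)$ m, and $\mathrm{SL}_2(\mathbb{Z}/q\mathbb{Z})$ for
various other q (cf. jl]).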

It is generally felt that approximate groups in quite general contexts can be 
controlled by objects built up from genuine subgroups and nilpotent objects; this 
has been found in all of the examples just mentioned and is suggested by the 
famous theorem of Gromov on groups with polynomial growth [32] and the recent
quantitative formulation of it due to Shalom and Tao [55]. Quite precise suggestions
along these lines have been made by Helfgott, Lindenstrauss and others: more
information on this can be found on Tao's blog [52].

Before leaving this subject, we remark that even (perhaps especially) in the
abelian case the issue of the dependence of K' on K is far from being resolved. No
examples are known to rule out the possibility that, with the right definition of the
highly-structured class $\mathcal{C}'$, K' can be taken to be polynomial in K. In particular
this is conjectured when the ambient group G is $\mathbb{F}_2^{\omega}$, the countable infinite vector
space over the field of two elements, and $\mathcal{C}'$ consists of (finite) subgroups of G.
This assertion⁵ is known as the polynomial Freiman-Ruzsa conjecture [49]; see also
[25, 27]. It is equivalent to the following question which, for many years, I have
tried to advertise to those for whom the word cohomology holds no fear.

Question 2.6. Suppose that $\phi : \mathbb{F}_2^n \to \mathbb{F}_2^{\omega}$ is a map such that $\phi(x + y) - \phi(x) - \phi(y)$
takes on at most K different values as x, y range over $\mathbb{F}_2^n$. Is it true that $\phi = \tilde\phi + \eta$,
where $\tilde\phi$ is linear and $\eta$ takes on at most $K' = \mathrm{poly}(K)$ different values?

It is a very easy exercise to obtain such a statement with $K' = 2^K$ but, so far
as I know, no serious improvement of this bound has ever been obtained.⁶

3. Approximate rings and fields 

Fortified by the experiences of the last section, one might attempt to come up
with a sensible notion of an approximate ring. A natural one, based perhaps on (2)
in the previous section, is as follows: if A is a finite subset of some ambient ring R,
we say that it is a K-approximate ring if $|A + A| \leq K|A|$ and $|A \cdot A| \leq K|A|$. Here,
of course, $A + A := \{a_1 + a_2 : a_1, a_2 \in A\}$ and $A \cdot A := \{a_1a_2 : a_1, a_2 \in A\}$ as before.
If R = F is actually a field (or an integral domain, which embeds into its field
of fractions) then we refer to A as an approximate field, noting that approximate
closure under division is essentially automatic in view of the rough equivalence of
the notions (1) and (2) of approximate group.



5 There are variants of this conjecture over other groups, such as $\mathbb{Z}$; see [23, 31].
6 I would be very interested to see even a bound of the form $2^{o(K)}$.






The study of approximate rings and fields was initiated in a paper of Erdős and
Szemerédi [T7] who proved (though not in this language!) that a K-approximate
subfield of $\mathbb{Z}$ must have size poly(K). They in fact conjectured that the right
bound is $C_\epsilon K^{1+\epsilon}$ for any $\epsilon > 0$, but this is so far unresolved; the best exponent so
far obtained is $3 + \epsilon$, a result of Solymosi [57]. Note that this is equivalent to, and
more usually stated as, the lower bound

$$\max(|A + A|, |A \cdot A|) \geq c_\epsilon|A|^{1 + \frac{1}{3+\epsilon}}$$

for all finite sets $A \subseteq \mathbb{Z}$. In a different paper [56], Solymosi generalised the
Erdős-Szemerédi result to show that every K-approximate subfield of $\mathbb{C}$ has size at most
$2^{12}K^4$.
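The sum-product phenomenon behind these bounds is visible in the two extreme examples, an arithmetic and a geometric progression. The sketch below is illustrative only, with an arbitrary length n = 60 and hypothetical helper names: the arithmetic progression has a small sumset but a large product set, and the geometric progression behaves the other way round.

```python
def sum_set(A):
    """The set A + A."""
    return {a + b for a in A for b in A}

def product_set(A):
    """The set A . A."""
    return {a * b for a in A for b in A}

def sum_product_sizes(A):
    """Return the pair (|A + A|, |A . A|)."""
    return len(sum_set(A)), len(product_set(A))

n = 60
arithmetic = list(range(1, n + 1))      # 1, 2, ..., n
geometric = [2 ** i for i in range(n)]  # 1, 2, 4, ..., 2^(n-1)

for name, A in (("arithmetic", arithmetic), ("geometric", geometric)):
    sums, products = sum_product_sizes(A)
    print(name, len(A), sums, products, max(sums, products))
```

In both cases the larger of the two sizes is much bigger than n, which is what the lower bound on $\max(|A + A|, |A \cdot A|)$ predicts for every finite set of integers.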

The general theory of approximate fields can be said to have started with the
papers of Bourgain-Katz-Tao [8] and Bourgain-Glibichuk-Konyagin [7], [9], where⁷
the following result is established.

Theorem 3.1. Let p be a prime and let $K \geq 2$. Then every K-approximate subfield
of $\mathbb{F}_p$ has size at most $K^C$ or at least $K^{-C}p$, for some absolute constant C.

The arguments on page 384 of [7], though they are phrased in a more limited 
context, essentially prove that every approximate subfield (in an arbitrary ambient 
field) must be roughly equivalent to a genuine finite subfield. This unifies the results
of Erdős-Szemerédi and Solymosi with Theorem 3.1. In fact something similar is
true for approximate rings, at least provided the ambient ring R does not have "too
many" zero divisors. These issues are comprehensively explored in an interesting
paper [60] of Tao, which also has a very comprehensive collection of references.

Suppose that A is a K-approximate field in some ambient field F, that is to
say both $|A \cdot A|$ and $|A + A|$ are bounded by $K|A|$. We are going to sketch a
proof that F must contain a genuine subfield B which is "close" to A. The first
step is to prove the Katz-Tao lemma, which asserts that A (or, more precisely, a
large subset $A' \subseteq A$) behaves in a manner which more strongly resembles that
of a field: that is to say, A is almost closed under both addition/subtraction and
multiplication/division simultaneously. To give a (relevant) example, the set

$$\tilde A := \left\{a_5 + a_6\,\frac{a_1 - a_3}{a_4 - a_2} : a_1, \dots, a_6 \in A\right\}$$

has size at most $\tilde K|A|$, where $\tilde K = \mathrm{poly}(K)$.

A slick proof of the Katz-Tao lemma is given in [BHl Section 2.5] and we shall 
say little more about it here other than to remark that it involves a combination 
of Ruzsa's sumset calculus and clever elementary arguments. Personally, I regard 
it as part of the "basic" theory of approximate fields as opposed to the "structural 
theory", to be regarded on the same level as the arguments used to show that
definitions (1) - (6) of an approximate group are roughly equivalent (namely, Ruzsa's
sumset calculus and the Balog-Szemerédi-Gowers theorem). In other words one
might argue that the smallness of $\tilde A$, or of similar objects, might be taken as an
alternative definition of approximate field.

7 The original paper [8] of Bourgain, Katz and Tao did not quite classify the very small (smaller
than $p^{\delta}$) approximate subrings of $\mathbb{F}_p$; this restriction was removed in [§]. Very often the
approximate fields under consideration in a given setting will have size at least $p^{\delta}$, and for this
reason one often refers to the Bourgain-Katz-Tao theorem.
Suppose that A is known to have this property, that is to say $|\tilde A| \leq \tilde K|A|$. Then
it is possible to establish an intriguing dichotomy: if $\xi \in F^{\times}$ then either

(3.1) $|A + A\xi| = |A|^2$

or

(3.2) $|A + A\xi| \leq \tilde K|A|$.



Here, $A + A\xi$ refers to the set of all $a_1 + a_2\xi$ with $a_1, a_2 \in A$. To see why this is
so, note that $|A + A\xi| \leq |A|^2$ and that equality occurs if and only if the elements
$a_1 + a_2\xi$ are all distinct. If equality does not occur then we may find a nontrivial
solution to $a_1 + a_2\xi = a_3 + a_4\xi$, which means that $\xi = \frac{a_1 - a_3}{a_4 - a_2}$. But then every
element of $A + A\xi$ has the form

$$a_5 + a_6\xi = a_5 + a_6\,\frac{a_1 - a_3}{a_4 - a_2}$$

and thus lies in $\tilde A$.
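The collision argument just given can be traced on a small example. The following sketch is an illustration under stated assumptions, not the argument of the cited papers: it works in the prime field of order 1009 (an arbitrary choice), takes a random four-element set A, and checks for every nonzero $\xi$ that either the sums $a_1 + a_2\xi$ are all distinct, or $\xi$ is a quotient of differences of A and $A + A\xi$ lies inside the set of all $a_5 + a_6(a_1 - a_3)/(a_4 - a_2)$. All function names are hypothetical.

```python
import random

p = 1009  # a small prime, chosen arbitrarily for the illustration

def inverse_mod_p(x):
    """Multiplicative inverse of a nonzero residue, via Fermat's little theorem."""
    return pow(x, p - 2, p)

def dilate_sum(A, xi):
    """The set A + A*xi inside the field of order p."""
    return {(a1 + a2 * xi) % p for a1 in A for a2 in A}

def quotient_set(A):
    """All quotients (a1 - a3) / (a4 - a2) with a4 != a2."""
    return {
        ((a1 - a3) * inverse_mod_p((a4 - a2) % p)) % p
        for a1 in A for a3 in A for a4 in A for a2 in A
        if a4 != a2
    }

def a_tilde(A):
    """All elements a5 + a6 * q with q in the quotient set of A."""
    Q = quotient_set(A)
    return {(a5 + a6 * q) % p for a5 in A for a6 in A for q in Q}

def check_dichotomy(A):
    """For each nonzero xi, record which side of the dichotomy occurs."""
    Q, T = quotient_set(A), a_tilde(A)
    outcomes = {"all_distinct": 0, "inside_a_tilde": 0}
    for xi in range(1, p):
        S = dilate_sum(A, xi)
        if len(S) == len(A) ** 2:
            assert xi not in Q          # no collision, so xi is not a quotient
            outcomes["all_distinct"] += 1
        else:
            assert xi in Q and S <= T   # collision forces A + A*xi into a_tilde(A)
            outcomes["inside_a_tilde"] += 1
    return outcomes

random.seed(1)
A = random.sample(range(p), 4)
print(check_dichotomy(A))
```

For a random set A the second alternative says little, since the set of all such quotients can be large; the dichotomy only becomes powerful once that set is known to have size at most $\tilde K|A|$.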

On the other hand, it is not hard to see using Ruzsa calculus⁸ that if $\xi_1, \xi_2$ satisfy
(3.2) then for $\xi = \xi_1 + \xi_2$, $\xi_1 - \xi_2$, $\xi_1\xi_2$, $\xi_1^{-1}$ we have

$$|A + A\xi| \leq C\tilde K^{C'}|A|$$

for absolute constants C, C'. If $\tilde K$ is a sufficiently small power of $|A|$ then this
means that (3.1) cannot hold, forcing us to conclude that (3.2) holds for $\xi$. In
this way we identify the set⁹ of all $\xi$ satisfying (3.2) as a genuine subfield of F.

Straightforward additional arguments allow one to show that this subfield and A
are roughly equivalent.



The original argument of [8] is different and specific to $\mathbb{F}_p$ but rather fun and,
given the preceding discussion, it is not hard to say a few meaningful words about it.
Suppose for the sake of illustration that $A \subseteq \mathbb{F}_p$ is a K-approximate subfield of size
$\sim p^{1/10}$; our task is to derive a contradiction if (say) $K = p^{o(1)}$. Suppose that the
Katz-Tao lemma has already been applied, so that $\tilde A$, as defined above, is known to
be small. The sets $\tilde A, \tilde{\tilde A}, \dots$ arising from (boundedly many) successive applications
of this operation may also be shown to be small. Now simple averaging arguments
(using nothing more than the fact that $|A| \sim p^{1/10}$) show that $\mathbb{F}_p$ has dimension
at most 100 (say) as a "vector space" over A; that is, there exist $x_1, \dots, x_{100} \in \mathbb{F}_p$
such that

(3.3) $\mathbb{F}_p = Ax_1 + \dots + Ax_{100}$.

Now $x_1, \dots, x_{100}$ cannot be a "basis" for $\mathbb{F}_p$ over A since otherwise we would have
$p = |A|^{100}$, contrary to the assumption that p is prime. Thus there must exist some
$x \in \mathbb{F}_p$ which is representable in two different ways as

$$x = a_1x_1 + \dots + a_{100}x_{100} = a'_1x_1 + \dots + a'_{100}x_{100}$$



8 In addition to the bounds of Theorem 2.3 one requires an inequality controlling
$|A_1 + A_2 + A_3|$ in terms of the $|A_i + A_j|$.

9 Note that this set may be identified with $\frac{A - A}{(A - A)^{\times}}$.
with $a_1, \dots, a_{100}, a'_1, \dots, a'_{100} \in A$. Suppose, without loss of generality, that
$a_{100} \neq a'_{100}$. Then

$$x_{100} = \frac{(a_1 - a'_1)x_1 + \dots + (a_{99} - a'_{99})x_{99}}{a'_{100} - a_{100}}.$$

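By substituting this expression for $x_{100}$ into (3.3), we see that

$$\mathbb{F}_p = \tilde A x_1 + \dots + \tilde A x_{99}.$$

Repeating the argument gives (without loss of generality)

$$\mathbb{F}_p = \tilde{\tilde A} x_1 + \dots + \tilde{\tilde A} x_{98},$$

and we may continue in this fashion to get, eventually,

$$\mathbb{F}_p = A^{*} x_1,$$

where $A^{*}$ is obtained from A by 99 successive applications of the operation
$A \mapsto \tilde A$. This is contrary to the fact that none of the sets $\tilde A, \tilde{\tilde A}, \dots$ has size much
larger than that of A itself, namely about $p^{1/10}$, and a contradiction ensues.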

Remarkably, the main "dimension reduction" idea here comes from a paper in
point-set topology, namely Edgar and Miller's solution of the Erdős-Volkmann ring
problem [TS] (that is, the statement that all Borel subrings of $\mathbb{R}$ have dimension
0 or 1). See in particular Lemma 1.3 of that paper.

4. Helfgott's results 



In this section we discuss the results of Helfgott [33, 34] concerning approximate
subgroups of

$$\mathrm{SL}_2(\mathbb{F}_p) := \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} : a, b, c, d \in \mathbb{F}_p,\ ad - bc = 1\right\}.$$

Helfgott proves the following.

Theorem 4.1 (Helfgott). Suppose that $A \subseteq \mathrm{SL}_2(\mathbb{F}_p)$ is a K-approximate group.
Then A is roughly $K^C$-equivalent to an upper-triangular $K^C$-approximate subgroup
of $\mathrm{SL}_2(\mathbb{F}_p)$ (that is, an approximate subgroup conjugate to a set of upper-triangular
matrices).

Rather than discuss Helfgott's result itself, we discuss the analogous question for
$\mathrm{SL}_2(\mathbb{C})$. Here the answer is rather simpler and is given in [13], based on Helfgott's
work.

Theorem 4.2. Suppose that $A \subseteq \mathrm{SL}_2(\mathbb{C})$ is a K-approximate group. Then A is
roughly $K^C$-equivalent to an abelian $K^C$-approximate subgroup of $\mathrm{SL}_2(\mathbb{C})$.

If desired the abelian approximate group could itself be controlled by a gener- 
alised progression using the Freiman-Ruzsa theorem. 

We will only sketch a proof of the weaker result that A is $K^C$-equivalent to an
upper-triangular $K^C$-approximate subgroup, that is to say the direct analogue of
Helfgott's result. In $\mathrm{SL}_2(\mathbb{C})$, additional arguments may then be applied to prove
Theorem 4.2; there are no such arguments in $\mathrm{SL}_2(\mathbb{F}_p)$, since the upper-triangular
"Borel subgroup"

$$\left\{\begin{pmatrix} \lambda & \mu \\ 0 & \lambda^{-1} \end{pmatrix} : \lambda \in \mathbb{F}_p^{\times}, \mu \in \mathbb{F}_p\right\}$$

is not close to abelian.

The proof of this weak form of Theorem 4.2 is simpler than that of Theorem
4.1 in two major ways. Firstly since $\mathbb{C}$ is algebraically closed we may talk about
eigenvalues, eigenvectors and diagonalization without the need to pass to an extension
field, whereas over $\mathbb{F}_p$ we would have to involve the quadratic extension $\mathbb{F}_{p^2}$.
Secondly, the structure of K-approximate subfields of $\mathbb{C}$ is easy to describe: by the
theorem of Solymosi [56] they are all sets of size at most $2^{12}K^4$. Theorem 3.1, by
contrast, has to allow for those approximate fields which are almost all of $\mathbb{F}_p$. Worse
still, to handle $\mathrm{SL}_2(\mathbb{F}_p)$ Helfgott must in fact classify approximate subfields of $\mathbb{F}_{p^2}$,
and this involves the additional possibility of sets which are close to the subfield
$\mathbb{F}_p$.

For the sake of exposition, we will assume in the first instance that A is a
genuine finite subgroup of $\mathrm{SL}_2(\mathbb{C})$; our task is to show that A contains a large
upper-triangular subgroup. When we have sketched how Helfgott's argument looks
in this case we will remark on the additional technicalities required to make the
argument "robust" enough to apply to K-approximate groups.

The key idea in Helfgott's argument, referred to by subsequent authors as trace
amplification, involves examining the set of traces

$$\operatorname{tr} A := \{\operatorname{tr} a : a \in A\}.$$

We will sketch a proof that, outside certain degenerate situations, a large subset of
this set of traces is a $2^{24}$-approximate subfield of $\mathbb{C}$ of size greater than $2^{108}$. This
contradicts Solymosi's theorem [56] and so we must be in one of those degenerate
situations. Careful analysis of each of them leads to the conclusion that A is roughly
upper-triangular.

The first degenerate situation to analyse is that in which $\operatorname{tr} A$ is small, an
appropriate notion of small being $|\operatorname{tr} A| \leq 2^{111}$. Now a linear algebra computation
(Lemma 4.2 of [6]) confirms that if $g, h \in A$ are elements without a common
eigenvector in $\mathbb{C}^2$ then the map

$$\mathrm{SL}_2(\mathbb{C}) \to \mathbb{C}^3 : x \mapsto (\operatorname{tr} x, \operatorname{tr}(gx), \operatorname{tr}(hx))$$

is at most two-to-one. This, or rather the fact that something like this holds, is
not at all surprising: indeed knowledge of $\operatorname{tr}(x)$, $\operatorname{tr}(gx)$, $\operatorname{tr}(hx)$ together with the
fact that $\det(x) = 1$ provides four pieces of information which, generically, ought
to more-or-less determine the four entries of the matrix x. If A contains two such
elements g, h then it follows that we have

$$|A| \leq 2|\operatorname{tr} A|^3 \leq 2^{334},$$

and so $|A|$ is also small.¹⁰ If, by contrast, A does not contain two such elements, and
if $|A| > 3$, then it is easy to see that there is some $v \in \mathbb{C}^2$ which is an eigenvector
for all of A simultaneously. With respect to a basis containing v, every matrix in
A is upper-triangular.



10 Additive combinatorics has a bad reputation for referring to quantities like $2^{334}$ as "small".
"Bounded by an absolute constant" might be more appropriate.
Suppose, then, that $|\operatorname{tr} A| > 2^{111}$. In particular (!) there is some element $g \in A$
which is non-parabolic, or in other words $\operatorname{tr} g \neq \pm 2$; such elements have distinct
eigenvalues and so are diagonalisable.

Write $A' \subseteq A$ for the set of non-parabolic elements; then
$|\operatorname{tr} A'| \geq |\operatorname{tr} A| - 2 \geq \frac{1}{2}|\operatorname{tr} A|$. Now in $\mathrm{SL}_2(\mathbb{C})$ the trace of a non-parabolic
element g completely determines the conjugacy class of g. It follows that there is some
non-parabolic $g \in A$ such that the conjugacy class of A containing g has size at most
$2|A|/|\operatorname{tr} A|$. By the orbit-stabiliser theorem, the centraliser¹¹

$$T = C_A(g) = \{a \in A : ag = ga\}$$

has size at least $\frac{1}{2}|\operatorname{tr} A|$. But by changing basis so that g is in diagonal form (with
distinct diagonal entries) it is not hard to check that T consists entirely of diagonal
matrices. No single trace can arise from more than two of these elements, and so
$|\operatorname{tr} T| \geq \frac{1}{4}|\operatorname{tr} A| > 2^{109}$. We shall show that the set

$$R := \{\operatorname{tr} a^2 : a \in T\}$$

is a $2^{24}$-approximate subfield of $\mathbb{C}$. Noting that

(4.1) $|R| \geq \frac{1}{2}|\operatorname{tr} T| > 2^{108}$,

this is contrary to Solymosi's theorem. In order to do this we play around a little
with traces. Such playing around is most productive if, in the basis just selected,
there is an element $a = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \in A$ with $a_{11}a_{12}a_{21}a_{22} \neq 0$. The absence of
such an element is another degenerate situation to analyse, and once again one can
check¹² that A must be either upper-triangular or else equal to one of the dihedral
groups, each of which has an index two abelian subgroup.

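Now let us note that

(4.2) $R \cdot R \subseteq R + R$,

this being a consequence of the fact that

$$(t_1^2 + t_1^{-2})(t_2^2 + t_2^{-2}) = (t_1^2t_2^2 + t_1^{-2}t_2^{-2}) + (t_1^2t_2^{-2} + t_1^{-2}t_2^2).$$

Let us also note that, for $s_1, s_2 \in T$,

$$\operatorname{tr}(s_1s_2\,a\,s_1s_2^{-1}a^{-1}) = \mu\operatorname{tr}(s_1^2) + \lambda\operatorname{tr}(s_2^2),$$

where $\mu := a_{11}a_{22} \neq 0$ and $\lambda := -a_{12}a_{21} \neq 0$, which means that

$$\lambda R + \mu R \subseteq \operatorname{tr} A.$$

In particular

$$|R + \tfrac{\mu}{\lambda}R| = |\lambda R + \mu R| \leq |\operatorname{tr} A| \leq 16|R|,$$

which, by Ruzsa's inequalities (Theorem 2.3 applied with $A_1 = \frac{\mu}{\lambda}R$ and $A_2 = A_3 = -R$)
implies that $|R + R| \leq 2^{24}|R|$. This, together with (4.2), implies that R is a
$2^{24}$-approximate subring of $\mathbb{C}$. By Solymosi's theorem this implies that $|R| \leq 2^{108}$,
contrary to (4.1). □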
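The two trace identities that drive this step can be verified with exact rational arithmetic. The sketch below is only a sanity check of the algebra, under stated assumptions: the matrix a and the diagonal entries u, v are arbitrary rational choices (so the matrices lie in $\mathrm{SL}_2(\mathbb{Q}) \subseteq \mathrm{SL}_2(\mathbb{C})$ rather than in a finite group), and the function names are hypothetical. It checks that $\operatorname{tr}(s_1s_2\,a\,s_1s_2^{-1}a^{-1}) = \mu\operatorname{tr}(s_1^2) + \lambda\operatorname{tr}(s_2^2)$ for diagonal $s_1 = \mathrm{diag}(u, u^{-1})$, $s_2 = \mathrm{diag}(v, v^{-1})$, with $\mu = a_{11}a_{22}$ and $\lambda = -a_{12}a_{21}$, and that the product of two traces of squares is a sum of two traces of squares.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

def diag(t):
    """The diagonal matrix diag(t, 1/t) of determinant 1."""
    return [[t, 0], [0, 1 / t]]

def inverse_sl2(X):
    """Inverse of a 2x2 matrix of determinant 1."""
    return [[X[1][1], -X[0][1]], [-X[1][0], X[0][0]]]

def square_trace(t):
    """Trace of diag(t, 1/t) squared, namely t^2 + t^(-2)."""
    return t * t + 1 / (t * t)

def check_identities(a, u, v):
    """Return (trace identity holds, product identity holds)."""
    mu = a[0][0] * a[1][1]
    lam = -a[0][1] * a[1][0]
    # s1*s2 = diag(u*v) and s1*s2^{-1} = diag(u/v)
    word = mat_mul(mat_mul(diag(u * v), a), mat_mul(diag(u / v), inverse_sl2(a)))
    trace_identity = trace(word) == mu * square_trace(u) + lam * square_trace(v)
    product_identity = (square_trace(u) * square_trace(v)
                        == square_trace(u * v) + square_trace(u / v))
    return trace_identity, product_identity

# An arbitrary matrix of determinant 1 with no zero entries.
a = [[Fraction(2), Fraction(3)], [Fraction(1), Fraction(2)]]
print(check_identities(a, Fraction(3, 2), Fraction(5, 7)))
```

The use of squares in the definition of R is what makes the first identity work: the pair $(s_1s_2, s_1s_2^{-1})$ has product $s_1^2$ and quotient $s_2^2$.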

In the above sketch we assumed, of course, that A was actually a finite subgroup.
However the argument was of a type that can be made to work for K-approximate
groups also. To explain what we mean by this let us remark, rather vaguely, on
how one or two of the steps adapt and then offer some general remarks.

11 T is for torus, the word used for such a subgroup in Lie theory.
12 This is, admittedly, a somewhat tedious check.

Orbit-Stabiliser theorem. If A is a group and if $x \in A$ then we used the fact
that the size of the conjugacy class $\Sigma(x)$ containing x and that of the centraliser
$C_A(x)$ are related by $|\Sigma(x)||C_A(x)| = |A|$. In fact we only used the inequality
$|C_A(x)| \geq |A|/|\Sigma(x)|$, giving us an element with large centraliser, and here is a
simple way of seeing why this holds: all of the conjugates $axa^{-1}$, $a \in A$, lie in $\Sigma(x)$,
and so by the pigeonhole principle there must be distinct elements $a_1, \dots, a_k \in A$,
$k \geq |A|/|\Sigma(x)|$, with $a_1xa_1^{-1} = \dots = a_kxa_k^{-1}$. But then the elements $a_1^{-1}a_i$,
$i = 1, \dots, k$, centralise x. Now if A is only a K-approximate group then this
argument does not quite work, as there is no well-defined notion of conjugacy class.
However a similar pigeonhole argument nonetheless gives us an element with large
centraliser, since the conjugates $axa^{-1}$ are all constrained to lie in $A^3$, a set of size
at most $K^2|A|$.
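The pigeonhole argument is easy to trace in a small genuine group. The following sketch is an illustration under stated assumptions: it uses the symmetric group on four points, represented as tuples, and a transposition x; these choices and all function names are made up for the example. It groups the elements a by the conjugate $axa^{-1}$, picks a largest bucket $a_1, \dots, a_k$, and checks that the elements $a_1^{-1}a_i$ centralise x.

```python
from itertools import permutations
from collections import defaultdict

def compose(p, q):
    """The permutation p after q, with permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(a, x):
    """The conjugate a x a^{-1}."""
    return compose(compose(a, x), inverse(a))

G = list(permutations(range(4)))  # the symmetric group on 4 points
x = (1, 0, 2, 3)                  # a transposition

# Pigeonhole: group the elements a of G by the value of a x a^{-1}.
buckets = defaultdict(list)
for a in G:
    buckets[conjugate(a, x)].append(a)

conjugacy_class = set(buckets)
centraliser = [a for a in G if compose(a, x) == compose(x, a)]

largest = max(buckets.values(), key=len)
a1 = largest[0]
from_pigeonhole = [compose(inverse(a1), ai) for ai in largest]

print(len(G), len(conjugacy_class), len(centraliser), len(from_pigeonhole))
```

In the approximate setting the same bucketing is applied to conjugates lying in $A^3$, so the number of buckets is at most $K^2|A|$ rather than the size of a conjugacy class.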

Escape from subvarieties. A more interesting point concerns the location of an
element of A which, in a given basis, has no zero entries. Whilst this might not be
a priori possible if A is only an approximate group, it is possible to find such an
element in $A^n$ for some bounded n (independent of the approximation parameter
K), and this is good enough for Helfgott's purposes. This is a special case of a nice
lemma of Eskin, Mozes and Oh [18] called "escape from subvarieties". The point
is that the group $\langle A \rangle$ generated by A, if it is not almost upper-triangular, contains
an element with no zero entries - indeed this fact was used in the above sketch. In
other words, $\langle A \rangle$ is not contained in the subvariety of $\mathrm{SL}_2(\mathbb{C})$ defined by

$$V := \left\{\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}_2(\mathbb{C}) : abcd = 0\right\}.$$

The Eskin-Mozes-Oh result states that in such a situation we can find "evidence" for
the non-containment of $\langle A \rangle$ inside V by taking just a bounded number, depending
only on V, of products of A.

It seems, then, that certain types of argument - in some sense those involving 
"bounded length" computations in the ambient group - adapt very well from the 
traditional group theory setting to approximate groups. At the moment we do 
not have anything approaching a precise formulation of this principle and indeed 
at present the passage from the "exact" to the approximate is as much an art as 
a science. Nonetheless, there seems to be merit in looking for "bounded length" 
proofs in traditional group theory which might be adapted to the approximate 
setting. Perhaps this is as good a place as any to mention the remarkable recent 
paper of Hrushovski [36] in which tools from model theory have been applied to
the study of approximate groups. The ramifications of that paper are not yet 
completely clear, but it looks as though Theorem 1.3 of that paper together with 
some structure theory of algebraic groups ought to lead, without too much difficulty, 
to a proof of the following statement. 

Conjecture 4.3. Suppose that $A \subseteq \mathrm{SL}_n(\mathbb{C})$ is a K-approximate group. Then
there is a K'-approximate group B which is nilpotent and K'-controls A, where K'
depends only on K.




It seems reasonable to conjecture that K' can be taken to depend polynomially 
on K, although in their present form Hrushovski's techniques will not give this.


5. Cayley graphs on $\mathrm{SL}_2(\mathbf{F}_p)$

We move on now to applications of the theory of approximate groups. In this 
section we discuss the paper [3] of Bourgain and Gamburd. This paper concerns 
expander graphs. For the purposes of this discussion these are $2k$-regular graphs
$\Gamma$ on $n$ vertices for which there is a constant $c > 0$ such that for any set $X$ of at
most $n/2$ vertices of $\Gamma$, the number of vertices outside $X$ which are adjacent to $X$
is at least $c|X|$. Expander graphs share many of the properties of random regular
graphs, and this is an important reason why they are of great interest in theoretical
computer science. There are many excellent articles on expander graphs ranging
from the very concise [51] to the seriously comprehensive [35].

A key issue is that of constructing explicit expander graphs, and in particular 
that of constructing families of expanders in which k and c are fixed but the number 
n of vertices tends to infinity. Many constructions have been given, and several 
of them arise from Cayley graphs. Let $G$ be a finite group and suppose that
$S = \{g_1^{\pm 1}, \dots, g_k^{\pm 1}\}$ is a symmetric set of generators for $G$. The Cayley graph
$C(G, S)$ is the $2k$-regular graph on vertex set $G$ in which vertices $x$ and $y$ are joined
if and only if $xy^{-1} \in S$. Such graphs provided some of the earliest examples of
expanders [HJU2]. A natural way to obtain a family of such graphs is to take some
large "mother" group $G$ admitting many homomorphisms $\pi$ from $G$ to finite groups,
a set $S \subseteq G$, and then to consider the family of Cayley graphs $C(\pi(G), \pi(S))$ as $\pi$
ranges over a family of homomorphisms. The work under discussion concerns the
case $G = \mathrm{SL}_2(\mathbf{Z})$, which of course admits homomorphisms
$\pi_p : \mathrm{SL}_2(\mathbf{Z}) \to \mathrm{SL}_2(\mathbf{F}_p)$ for each prime $p$. For certain sets $S \subseteq G$, for example

$$S = \left\{ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{\pm 1}, \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}^{\pm 1} \right\}$$

or

$$S = \left\{ \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}^{\pm 1}, \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}^{\pm 1} \right\},$$

spectral methods from the theory of automorphic forms may be used to show that
$(C(\pi_p(G), \pi_p(S)))_{p \text{ prime}}$ is a family of expanders. See [40] and the references therein.
These methods depend on the fact that the group $\langle S \rangle$ has finite index in
$G = \mathrm{SL}_2(\mathbf{Z})$ and they fail when this is not the case, for example when

$$S = \left\{ \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}^{\pm 1}, \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}^{\pm 1} \right\}. \tag{5.1}$$

In [ID] Lubotzky asked whether the corresponding Cayley graphs in this and other
cases might nonetheless form a family of expanders, the particular case of (5.1)
being known as his "1-2-3 question". The paper of Bourgain and Gamburd under
discussion answers this quite comprehensively, showing that all that is required is
that the group generated by $S$ is not virtually abelian (that is, does not contain a
finite-index abelian subgroup). We will sketch the proof in the case that $S$ generates
a nonabelian free subgroup of $\mathrm{SL}_2(\mathbf{Z})$. This is essentially the most general case,
since the kernel of the natural homomorphism from $\langle S \rangle$ to
$\mathrm{SL}_2(\mathbf{F}_2) \cong \mathrm{Sym}(3)$ is free and has index at most 6 in $\langle S \rangle$.

Theorem 5.1 (Bourgain-Gamburd). Let $G = \mathrm{SL}_2(\mathbf{Z})$ as above and suppose that
$S$ is a finite symmetric set generating a free subgroup of $\mathrm{SL}_2(\mathbf{Z})$. Then

$$(C(\pi_p(G), \pi_p(S)))_{p \text{ prime}}$$

is a family of expanders.

The notation we have introduced here is rather cumbersome, so let us write
$\Gamma_p := C(\pi_p(G), \pi_p(S))$. For concreteness we will focus on the special case
$S = \{A, A^{-1}, B, B^{-1}\}$, where

$$A = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}$$

are the matrices relevant to Lubotzky's 1-2-3 question. The argument is almost
identical in any other case. In this case, then, $\Gamma_p$ is the graph on vertex set
$\mathrm{SL}_2(\mathbf{F}_p)$ in which $x$ is joined to $y$ if and only if $xy^{-1}$ is one of the elements
$A$, $A^{-1}$, $B$ or $B^{-1}$ considered modulo $p$. Supposing that $p > 3$, each of these graphs
is 4-regular. The number of vertices in $\Gamma_p$ is $n := |\mathrm{SL}_2(\mathbf{F}_p)| = p(p^2 - 1)$.

The reader may be interested to see a proof, using the "ping-pong" technique of
Felix Klein, that the subgroup of $\mathrm{SL}_2(\mathbf{Z})$ generated by these $A$ and $B$ is indeed
free. Consider the natural action of $A$ and $B$ on the projective line $\mathbf{P}^1(\mathbf{Q})$. Write

$$X := \{(\lambda : 1) \in \mathbf{P}^1(\mathbf{Q}) : |\lambda| < 1\}$$

and

$$Y := \{(1 : \lambda) \in \mathbf{P}^1(\mathbf{Q}) : |\lambda| < 1\},$$

and observe that $X$ and $Y$ are disjoint and jouent au ping pong, that is to say

$$A^n(X) \subseteq Y \quad \text{for all } n \in \mathbf{Z} \setminus \{0\}$$

and

$$B^n(Y) \subseteq X \quad \text{for all } n \in \mathbf{Z} \setminus \{0\}.$$

(The origin of the name should be clear - the "players" $A$ and $B$ hit the domains
$X$ and $Y$ back and forth - as should the preference for the French term rather
than the cumbersome "play table tennis with one another".) If the group generated
by $A$ and $B$ is not free, then some nontrivial reduced word in $A$ and $B$
is equal to the identity, where "reduced word" means a finite word of the form
$\dots A^{m_1} B^{n_1} \dots A^{m_k} B^{n_k} \dots$ with $m_1, n_1, \dots, m_k, n_k \neq 0$. The conjugate of such a
word by an appropriate power of $A$ will still be the identity and will now have the
form $w = A^{m_1} B^{n_1} \dots A^{m_k} B^{n_k} A^{m_{k+1}}$ with $m_i, n_j \neq 0$. However by repeated
application of the ping-pong properties we see that $w(X) \subseteq Y$, certainly an impossibility
since $X$ and $Y$ are disjoint and $w$ is supposed to be the identity.
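The ping-pong inclusions can be spot-checked numerically. The following is a minimal sketch, not a proof: it tests the inclusions only on a finite sample of rational points and a finite range of exponents, and the helper names (`act`, `in_X`, `in_Y`, `power_A`, `power_B`) are hypothetical names introduced here for illustration.

```python
from fractions import Fraction

def act(M, point):
    """Apply the 2x2 integer matrix M = (a, b, c, d) to the projective point (x : y)."""
    a, b, c, d = M
    x, y = point
    return (a * x + b * y, c * x + d * y)

def in_X(point):
    """(x : y) lies in X = {(lam : 1) : |lam| < 1} iff y != 0 and |x| < |y|."""
    x, y = point
    return y != 0 and abs(x) < abs(y)

def in_Y(point):
    """(x : y) lies in Y = {(1 : lam) : |lam| < 1} iff x != 0 and |y| < |x|."""
    x, y = point
    return x != 0 and abs(y) < abs(x)

def power_A(n):
    return (1, 3 * n, 0, 1)   # A^n for A = [[1, 3], [0, 1]]

def power_B(n):
    return (1, 0, 3 * n, 1)   # B^n for B = [[1, 0], [3, 1]]

# A finite sample of rationals lam with |lam| < 1, and nonzero exponents n.
samples = [Fraction(p, q) for q in range(1, 8) for p in range(-q + 1, q)]
exponents = [n for n in range(-5, 6) if n != 0]

ping = all(in_Y(act(power_A(n), (lam, 1))) for lam in samples for n in exponents)
pong = all(in_X(act(power_B(n), (1, lam))) for lam in samples for n in exponents)
disjoint = not any(in_X((lam, 1)) and in_Y((lam, 1)) for lam in samples)
print(ping, pong, disjoint)
```

The reason the check succeeds is visible in the code: $A^n$ sends $(\lambda : 1)$ to $(\lambda + 3n : 1)$, and $|\lambda + 3n| > 2$ whenever $|\lambda| < 1$ and $n \neq 0$.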

Following that slight digression let us focus once again on the Cayley graphs $\Gamma_p$,
our aim being to prove that they form a family of expanders as $p$ ranges over the
primes. To do this we begin by giving a spectral interpretation of the expansion
property which we defined combinatorially above. For each $p$ we may consider the
Laplacian of the corresponding Cayley graph, that is to say the operator

$$\Delta : L^2(\mathrm{SL}_2(\mathbf{F}_p)) \to L^2(\mathrm{SL}_2(\mathbf{F}_p))$$

defined by

$$\Delta f(x) := f(x) - \tfrac{1}{4}\bigl(f(Ax) + f(A^{-1}x) + f(Bx) + f(B^{-1}x)\bigr).$$

The eigenvalues of the Laplacian lie in the interval $[0,2]$. Zero is certainly an
eigenvalue, since $\Delta 1 = 0$. Write the eigenvalues in ascending order as
$0 = \lambda_0 \leq \lambda_1 \leq \dots \leq \lambda_{n-1}$. It turns out that the expansion properties of the
graph $\Gamma_p$ (in fact of any regular graph) are intimately connected with the size of the
second-smallest eigenvalue $\lambda_1 = \lambda_1(\Gamma_p)$. The precise relation between the combinatorial
property of expansion and this spectral property is discussed in Section 2 of [35], but for
our purposes we need only remark that it suffices to show that the second-smallest
eigenvalue $\lambda_1(\Gamma_p)$ is bounded away from zero independently of $n$ (in fact, this is
also a necessary condition for expansion). The term spectral gap is used to describe
this property: there is a gap at the bottom of the spectrum in which there are no
eigenvalues apart from zero.

To try to show that there is a spectral gap, consider the operator

$$T : L^2(\mathrm{SL}_2(\mathbf{F}_p)) \to L^2(\mathrm{SL}_2(\mathbf{F}_p))$$

given by $T := 4(\mathrm{id} - \Delta)$, that is to say

$$Tf(x) := f(Ax) + f(A^{-1}x) + f(Bx) + f(B^{-1}x).$$

The matrix (footnote 13) of $T$ is the same thing as the adjacency matrix of the graph
$\Gamma_p$, that is to say the matrix whose $xy$ entry is 1 if $x \sim y$ and zero otherwise. The
eigenvalues of $T$ are of course $\mu_i = 4(1 - \lambda_i)$, $i = 0, \dots, n-1$, and it is a very
well-known and easy to establish fact that the $2m$th moment $\sum_{i=0}^{n-1} \mu_i^{2m}$ is equal to
$n$ times $W_{2m}$, the number of closed walks of length $2m$ from the identity to itself.
It follows that we have

$$W_{2m} = \frac{1}{n} 4^{2m} \Bigl(1 + \sum_{i=1}^{n-1} (1 - \lambda_i)^{2m}\Bigr). \tag{5.2}$$

Note in particular that $W_{2m} \geq \frac{1}{n} 4^{2m}$, since all the terms are non-negative. At first
glance it looks as though the only way to use (5.2) to bound $\lambda_1$ away from zero
would be to get rather precise estimates on $W_{2m}$, and in particular one would at the
very least want to show that $W_{2m} < \frac{2}{n} 4^{2m}$. However a remarkable observation, used
earlier in related contexts by Sarnak and Xue [54] and Gamburd [21], comes into
play. This is that any eigenspace of the Laplacian is $\mathrm{SL}_2(\mathbf{F}_p)$-invariant, where the
action of $\mathrm{SL}_2(\mathbf{F}_p)$ on $L^2(\mathrm{SL}_2(\mathbf{F}_p))$ is the right-regular one given by
$g \circ f(x) := f(xg)$. In other words, any such eigenspace has the structure of a
representation of $\mathrm{SL}_2(\mathbf{F}_p)$ and thus, by basic representation theory, decomposes into
irreducible representations of $\mathrm{SL}_2(\mathbf{F}_p)$. But by a classical theorem of Frobenius all
such nontrivial representations have dimension at least $(p - 1)/2 \sim n^{1/3}$. This means that
$\lambda_1 = \lambda_2 = \dots = \lambda_l$ for some $l \gtrsim n^{1/3}$, and hence from (5.2) we in fact have the bound

$$W_{2m} \gg \frac{1}{n^{2/3}} 4^{2m} (1 - \lambda_1)^{2m}. \tag{5.3}$$

This enables a meaningful spectral gap (lower bound on $\lambda_1$) to be obtained from
somewhat weaker upper bounds on $W_{2m}$.
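To make the last remark concrete, here is a worked version of the deduction, offered as a sketch only. The hypothesis $W_{2m} \leq C \cdot 4^{2m}/n$ at $m = C_1 \log p$, and the constants $C$, $C'$, are assumptions introduced here for illustration; they stand for the kind of upper bound on $W_{2m}$ that one would hope to prove.

```latex
% Assumption (hypothetical): W_{2m} \le C \cdot 4^{2m}/n for m = C_1 \log p.
% Combine with (5.3), recalling n = p(p^2-1) \sim p^3:
\frac{1}{n^{2/3}}\, 4^{2m} (1-\lambda_1)^{2m} \ll W_{2m} \le \frac{C}{n}\, 4^{2m}
\quad\Longrightarrow\quad
(1-\lambda_1)^{2m} \le C' n^{-1/3} \sim \frac{C'}{p},
\\[4pt]
|1-\lambda_1| \le \Bigl(\frac{C'}{p}\Bigr)^{1/(2 C_1 \log p)}
  = \exp\Bigl(-\frac{\log p - \log C'}{2 C_1 \log p}\Bigr)
  \longrightarrow e^{-1/(2C_1)} < 1 \qquad (p \to \infty).
```

Under these assumptions $\lambda_1 \geq 1 - e^{-1/(2C_1)} - o(1)$, which is bounded away from zero uniformly in $p$. Note that a bound of the shape $W_{2m} \ll 4^{2m}/n$ is only needed at a single $m$ of size comparable to $\log p$.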

The main new content of [3], then, is to obtain those upper bounds on $W_{2m}$, the
number of walks of length $2m$ starting and finishing at the identity, for appropriate
$m$. A nice way of thinking about these walks is in terms of convolution powers of
the probability measure

$$\nu := \tfrac{1}{4}(\delta_A + \delta_{A^{-1}} + \delta_B + \delta_{B^{-1}})$$

on $\mathrm{SL}_2(\mathbf{F}_p)$, where $\delta_g(x) = n$ if $x = g$ and $0$ otherwise. This measure $\nu$ is a very
singular or "spiky" probability measure, supported on just the four points $A$, $A^{-1}$, $B$
and $B^{-1}$. Now the convolution

$$\nu^{(2)}(x) := \nu * \nu(x) := \mathbf{E}_{y \in \mathrm{SL}_2(\mathbf{F}_p)} \nu(xy^{-1}) \nu(y)$$

is supported on words of length at most two in $A$, $A^{-1}$, $B$ and $B^{-1}$, or alternatively
those $x$ in the graph $\Gamma_p$ which can be reached from the identity by a path of length
two, the value of $\nu * \nu(x)$ being $4^{-2} n$ times the number of paths of length two from
the identity to $x$. Similarly the higher convolution powers $\nu^{(m)}(x) := \nu * \dots * \nu(x)$ give
$4^{-m} n$ times the number of paths of length $m$ from the identity to $x$. The idea
of the proof is to examine these convolution powers, showing that they become
progressively more "spread out" until, for suitable $m$, $\nu^{(2m)}$ vaguely resembles the
uniform measure 1 which assigns weight one to each point of $\mathrm{SL}_2(\mathbf{F}_p)$. Then, in
particular, $\nu^{(2m)}(\mathrm{id}) \approx 1$, meaning that $W_{2m} \approx 4^{2m}/n$. Combined with (5.3), this
is enough to establish the desired spectral gap.

13 With respect to the basis of $L^2(\mathrm{SL}_2(\mathbf{F}_p))$ consisting of the functions
$1_t : \mathrm{SL}_2(\mathbf{F}_p) \to \mathbf{C}$ defined by $1_t(x) = 1$ if $x = t$ and $0$ otherwise.

The notion of a probability measure $\mu$ on $\mathrm{SL}_2(\mathbf{F}_p)$ being "spread out" may be
quantified using the $L^2$-norm

$$\|\mu\|_2 := \bigl(\mathbf{E}_{x \in \mathrm{SL}_2(\mathbf{F}_p)} \mu(x)^2\bigr)^{1/2}.$$

The $L^2$-norm of a delta measure $\delta_g$ is $n^{1/2}$, which is huge, whilst that of the
uniform measure 1 is equal to one, the smallest value possible by the Cauchy-Schwarz
inequality. It is not hard to show that convolution cannot increase the $L^2$-norm,
and so, writing $\nu^{(0)} := \delta_{\mathrm{id}}$, we have the chain of inequalities

$$n^{1/2} = \|\nu^{(0)}\|_2 \geq \|\nu^{(1)}\|_2 \geq \dots. \tag{5.4}$$

The aim is to show that this sequence is, in fact, rather rapidly decreasing. Roughly
speaking one shows that

$$\|\nu^{(m_1)}\|_2 \ll 1 \tag{5.5}$$

for some $m_1 \approx C_1 \log p$; this $m_1$ turns out to be an appropriate choice to substitute
into (5.3) in order to reach the desired conclusions.
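The chain (5.4) can be observed directly for a very small prime. The sketch below is illustrative only: the choice $p = 5$ (so $n = 120$), the number of steps, and all function names are assumptions made here, and nothing about the rate of decay for large $p$ can be read off from such a tiny example.

```python
from math import sqrt

P = 5  # assumption: a tiny prime p > 3, so that the whole group can be listed

def mul(x, y):
    """Multiply 2x2 matrices stored as (a, b, c, d), reducing modulo P."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % P, (a*f + b*h) % P, (c*e + d*g) % P, (c*f + d*h) % P)

IDENTITY = (1, 0, 0, 1)
# A, A^-1, B, B^-1 reduced modulo P.
GENS = [(1, 3, 0, 1), (1, P - 3, 0, 1), (1, 0, 3, 1), (1, 0, P - 3, 1)]

def generated_group():
    """All elements of the subgroup generated by GENS, by breadth-first search."""
    seen = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        new = []
        for x in frontier:
            for g in GENS:
                y = mul(g, x)
                if y not in seen:
                    seen.add(y)
                    new.append(y)
        frontier = new
    return seen

G = generated_group()
n = len(G)

def convolve_with_nu(mu):
    """Return nu * mu, i.e. x -> (1/4) * (sum over generators g of mu(g^-1 x))."""
    out = dict.fromkeys(G, 0.0)
    for y, weight in mu.items():
        for g in GENS:
            out[mul(g, y)] += weight / 4
    return out

def l2_norm(mu):
    """The normalised L^2 norm (E_x mu(x)^2)^(1/2)."""
    return sqrt(sum(v * v for v in mu.values()) / n)

mu = dict.fromkeys(G, 0.0)
mu[IDENTITY] = float(n)          # nu^(0) = delta_id, which has mean 1
norms = [l2_norm(mu)]
for _ in range(30):
    mu = convolve_with_nu(mu)
    norms.append(l2_norm(mu))

print(n, [round(v, 3) for v in norms])
```

The printed list starts at $n^{1/2}$ and never increases; by Cauchy-Schwarz it can never drop below 1, the norm of the uniform measure.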

It turns out that this sequence gets off to a rather good start. This is a consequence
of an observation of Margulis [43], namely that the freeness of the subgroup
of $\mathrm{SL}_2(\mathbf{Z})$ generated by $A$ and $B$ persists to some extent even when reduced
modulo $p$. Indeed let us take a reduced word $w = A^{m_1} B^{n_1} \dots A^{m_k} B^{n_k}$ with
$m_1, \dots, m_k, n_1, \dots, n_k \neq 0$ and suppose that this equals the identity when reduced
modulo $p$, that is to say in $\mathrm{SL}_2(\mathbf{F}_p)$. Lifting back up to $\mathrm{SL}_2(\mathbf{Z})$ we have

$$w = A^{m_1} B^{n_1} \dots A^{m_k} B^{n_k} \equiv \mathrm{id} \pmod{p}.$$

But the freeness of the lifted group means that $w \neq \mathrm{id}$, and thus in order to be
congruent to the identity mod $p$ the matrix $w$ must have at least one entry of size
at least $p - 1$. But by some simple matrix inequalities this is impossible provided
that

$$|m_1| + |n_1| + \dots + |m_k| + |n_k| < c \log p$$

for some absolute constant $c > 0$.
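For small primes one can simply search for the shortest nontrivial reduced word in $A^{\pm 1}, B^{\pm 1}$ that becomes trivial modulo $p$. The following sketch does this by tracking, for each length, the set of pairs (matrix, first letter) reachable by reduced words; the function names and the list of primes are assumptions chosen for illustration, and the output is meant only to be compared informally with $\log p$.

```python
from math import log

def mat_mul(x, y, p):
    """Multiply 2x2 matrices stored as (a, b, c, d), reducing modulo p."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a*e + b*g) % p, (a*f + b*h) % p, (c*e + d*g) % p, (c*f + d*h) % p)

def shortest_relation(p, max_len=60):
    """Length of the shortest nontrivial reduced word that is the identity mod p."""
    # Generators in the order A, A^-1, B, B^-1; inverse_of[i] indexes the inverse.
    gens = [(1, 3 % p, 0, 1), (1, (-3) % p, 0, 1),
            (1, 0, 3 % p, 1), (1, 0, (-3) % p, 1)]
    inverse_of = [1, 0, 3, 2]
    identity = (1, 0, 0, 1)
    # A state is (value of the word mod p, index of its leftmost letter).
    states = {(g, i) for i, g in enumerate(gens)}
    for length in range(1, max_len + 1):
        if any(x == identity for x, _ in states):
            return length
        # Prepend a letter that does not cancel the current leftmost letter.
        states = {(mat_mul(gens[j], x, p), j)
                  for x, i in states for j in range(4) if j != inverse_of[i]}
    return None

for p in (5, 7, 11, 13, 17, 19, 23, 29, 31):
    print(p, shortest_relation(p), round(log(p), 2))
```

Since $A^p \equiv \mathrm{id} \pmod p$, the search always terminates within $p$ steps; the point of the observation above is the lower bound of order $\log p$.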

It follows that the subgroup of $\mathrm{SL}_2(\mathbf{F}_p)$ generated by $A$ and $B$ is "free up to
words of length $c \log p$". In terms of the Cayley graphs $\Gamma_p$ this means that up to
retracing steps there is at most one walk of length $m$ between the identity and $x$ for
any $x \in \mathrm{SL}_2(\mathbf{F}_p)$ and for any $m \leq m_0 := c \log p / 2$. This implies that the measures
$\nu^{(m)}$, $m \leq m_0$, are already rather spread out. To quantify this (and in particular
to deal with the issue of "retracing steps") a result of Kesten concerning random
walks in the free group may be applied. The conclusion is that

$$\|\nu^{(m_0)}\|_2 \ll n^{1/2 - \gamma} \tag{5.6}$$

for some $\gamma > 0$. This is good progress on the way to (5.5) and represents a significant
improvement on the initial bound $\|\nu^{(0)}\|_2 = n^{1/2}$.

It is convenient to imagine, for the rest of the argument, that all probability
measures $\mu$ on $G$ have the form $\mu(x) = \frac{n}{|A|} 1_A(x)$ for some set $A \subseteq G$, the "support"
of $\mu$. Whilst this is clearly not true, various (somewhat technical) decompositions
into level sets may be used to reduce to this case. For such a measure we have

$$\|\mu\|_2 = (n/|A|)^{1/2},$$

and so the bound (5.6) corresponds to $|A| \gtrsim n^{2\gamma}$, certainly a reasonable level of
spreadoutness.

The rest of the argument, which constitutes the heart of the paper, involves
examining the convolution powers between $\nu^{(m_0)}$ and $\nu^{(m_1)}$ for a suitable
$m_1 \approx C_1 \log p$, the aim being to establish (5.5). An application of the "dyadic
pigeonholing argument", used to great effect by Bourgain in many papers, is employed: if
$\|\nu^{(m_1)}\|_2$ is much larger than 1, this means that the sequence (5.4) cannot decay too
rapidly between $\nu^{(m_0)}$ and $\nu^{(m_1)}$ and so there must be two convolution powers $\nu^{(m)}$
and $\nu^{(2m)}$, $m_0 \leq m < m_1$, such that $\|\nu^{(2m)}\|_2 \approx \|\nu^{(m)}\|_2$. Let us be deliberately
vague about the meaning of $\approx$ here.

Suppose that $\nu^{(m)}(x) = \frac{n}{|A|} 1_A(x)$ for some set $A \subseteq G$. Noting that
$\nu^{(2m)} = \nu^{(m)} * \nu^{(m)}$, it is not hard to compute that the ratio

$$\|\nu^{(2m)}\|_2^2 / \|\nu^{(m)}\|_2^2$$

is actually equal to $|A|^{-3}$ times the number of quadruples $(a_1, a_2, a_3, a_4) \in A^4$ with
$a_1 a_2 = a_3 a_4$. This may be compared with condition (4) in the list of properties
which are known to roughly characterise approximate groups. Thus, being even
rougher at this point,

$$\nu^{(m)} \approx \frac{n}{|H|} 1_H \tag{5.7}$$

for some approximate group $H \subseteq \mathrm{SL}_2(\mathbf{F}_p)$. Note that the rough equivalence of
(4) and other, more flexible definitions such as Definition 2.4 is one of the deeper
equivalences mentioned in §2, being reliant on the nonabelian Balog-Szemerédi-Gowers
theorem of Tao [59].
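The identity between the ratio of $L^2$-norms and the count of quadruples holds in any finite group, so it can be checked exactly in a small one. In the sketch below the symmetric group on four letters stands in for $\mathrm{SL}_2(\mathbf{F}_p)$ and the set $A$ is an arbitrary 7-element subset; both choices, and all names in the code, are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

G = list(permutations(range(4)))   # the symmetric group S_4, a stand-in group
n = len(G)

def compose(p, q):
    """Product pq of permutations, acting as i -> p[q[i]]."""
    return tuple(p[i] for i in q)

def norm_squared(f):
    """E_x f(x)^2 with the normalised counting measure on G."""
    return sum(v * v for v in f.values()) / n

A = G[:7]   # an arbitrary subset, chosen only for illustration
mu = {x: (Fraction(n, len(A)) if x in A else Fraction(0)) for x in G}

# (mu * mu)(x) = E_y mu(x y^-1) mu(y) = (1/n) * sum over ab = x of mu(a) mu(b).
conv = {x: Fraction(0) for x in G}
for a, b in product(G, repeat=2):
    conv[compose(a, b)] += mu[a] * mu[b] / n

ratio = norm_squared(conv) / norm_squared(mu)
energy = sum(1 for a1, a2, a3, a4 in product(A, repeat=4)
             if compose(a1, a2) == compose(a3, a4))
print(ratio, Fraction(energy, len(A) ** 3))
```

Exact rational arithmetic is used so that the two printed quantities can be compared for equality rather than up to rounding.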



If $H$ is already all of $\mathrm{SL}_2(\mathbf{F}_p)$ then (5.7) is telling us that $\nu^{(m)}$ is close to the
uniform distribution, in which case so is $\nu^{(m_1)}$, hence (5.5) is established and we
are done. If not then we apply Helfgott's result, Theorem 4.1, to conclude that
$H$ is essentially upper-triangular, and hence that $\nu^{(m_0)}$ has significant mass on an
upper-triangular subgroup of $\mathrm{SL}_2(\mathbf{F}_p)$.

The support of $\nu^{(m_0)}$, however, consists of words of length at most $m_0$ in the
generators $A$, $A^{-1}$, $B$ and $B^{-1}$ and, as we stated, these elements behave freely up
to words of this length. This is highly incompatible with upper-triangularity, which
in particular implies that we always have the commutator relation (footnote 14)

$$[[g_1, g_2], [g_3, g_4]] = \mathrm{id}. \tag{5.8}$$

A pleasant group-theoretic argument formalises this incompatibility and allows one
to show that any set of words of length at most $m_0$ in the generators $A$, $A^{-1}$, $B$
and $B^{-1}$ satisfying (5.8) has size at most $m_0^6$. This represents a tiny proportion of
the set of all such words, which (counted with multiplicity at least) has cardinality
$4^{m_0}$. This contradiction finishes the sketch proof of Theorem 5.1. □



Before moving on, we wish to record, for use in the next section, a further
observation concerning the measures $\nu^{(m)}$. We sketched a proof that $\|\nu^{(m_1)}\|_2 \approx 1$
for some $m_1 \approx C_1 \log p$, that is to say $\nu^{(m_1)}$ vaguely resembles the uniform
distribution on $\mathrm{SL}_2(\mathbf{F}_p)$. By taking further convolutions and using the fact that
irreducible representations have large degree once more, this may be bootstrapped
to show that $\nu^{(m)}$ becomes exponentially well uniformly-distributed:

$$\nu^{(m)}(x) = 1 + O(n e^{-cm}) \tag{5.9}$$

for some absolute $c > 0$ and for all $m$. Alternatively, such a statement can be
deduced directly from the spectral gap property, as is done for example in [5, §3.3].

It is interesting to ask whether the arguments might adapt to deal with Cayley
graphs on $\mathrm{SL}_n(\mathbf{F}_p)$ with $n \geq 3$. A recent paper of Bourgain and Gamburd [5] shows
that this is the case when $n = 3$. The argument is, in large part, quite similar to
the above, except of course that Helfgott's theorem on approximate subgroups of
$\mathrm{SL}_2(\mathbf{F}_p)$ must be replaced by his more difficult result [34] on approximate subgroups
of $\mathrm{SL}_3(\mathbf{F}_p)$. There is one significant extra difficulty, however, which is that there are
proper subgroups of $\mathrm{SL}_3(\mathbf{F}_p)$ which are not close to upper-triangular, an obvious
example being a copy of $\mathrm{SL}_2(\mathbf{F}_p)$. To deal with this a deep algebro-geometric
result of Nori [45] is brought into play, which states that any proper subgroup of
$\mathrm{SL}_3(\mathbf{F}_p)$, $p$ sufficiently large, must satisfy a non-trivial polynomial equation. To
obtain a contradiction, it must be shown that the set of words of length $m_0$ in the
generators $A$ and $B$ (say) does not concentrate on the corresponding subvariety
of $\mathrm{SL}_3(\mathbf{C})$, and here techniques from the theory of random matrix products and a
certain amount of "quantitative algebraic geometry" are brought into play.



6. Nonlinear sieving problems 

In this section we discuss work of Bourgain, Gamburd and Sarnak [6]. The goal 
of sieve theory, traditionally viewed as a part of analytic number theory, is to find 
prime numbers or at least to say something about them. Historically, the sieve 
arose through work of Brun and Merlin on the twin prime problem, that is to say 
the problem of finding infinitely many primes p such that p + 2 is also prime. Whilst
this remains a famous open problem, approximations to it have been found. For 
example, Brun established the following result. 

Theorem 6.1 (Brun). There are infinitely many integers n such that n(n + 2) has 
at most 9 prime factors. 



14 In other words, upper-triangular subgroups of $\mathrm{SL}_2(\mathbf{F}_p)$ are 2-step solvable.

Much later, Chen [14] replaced 9 by 3. One way of stating this type of result is
as follows: there are infinitely many $n$ for which $n(n + 2)$ is a 3-almost prime,
that is to say a positive integer with at most 3 prime factors.

The aim of [6] is to discover almost primes in more exotic locales, and specifically
in orbits of linear groups. We will sketch a proof of the following result. 

Theorem 6.2 (Bourgain-Gamburd-Sarnak). Let $A$ and $B$ be two matrices in
$\mathrm{SL}_2(\mathbf{Z})$ generating a free subgroup. Then there is some $r$ such that this group
contains infinitely many $r$-almost prime matrices (matrices, the product of whose
entries is $r$-almost prime).

Henceforth we shall say "almost prime" instead of "$r$-almost prime for some
$r$". We remark that in the specific case we focussed on in the last section, when

$$A = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix},$$

the theorem as stated follows from classical sieve theory of the type used to prove
Brun's theorem. Indeed (for example)

$$A^n B A = \begin{pmatrix} 9n + 1 & 30n + 3 \\ 3 & 10 \end{pmatrix},$$

and the product of the entries here is $2 \cdot 3^2 \cdot 5 \cdot (9n + 1) \cdot (10n + 1)$,
which will be almost prime for infinitely many $n$ by a simple variant of Brun's
analysis. The issue here is that the subgroup generated by $A$ and $B$ contains
unipotent elements (in this case both $A$ and $B$ are themselves unipotent).
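The formula for $A^n B A$ and the factorisation of the product of its entries are easy to confirm by direct multiplication. This is a small sketch with hypothetical helper names; the range of $n$ tested is arbitrary.

```python
def mat_mul(x, y):
    """Multiply two 2x2 integer matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

A = ((1, 3), (0, 1))
B = ((1, 0), (3, 1))

def a_power(n):
    """A^n for n >= 0, by repeated multiplication."""
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = mat_mul(result, A)
    return result

def entry_product(n):
    """Product of the four entries of A^n B A."""
    (p, q), (r, s) = mat_mul(mat_mul(a_power(n), B), A)
    return p * q * r * s

def prime_factor_count(k):
    """Number of prime factors of k >= 1, counted with multiplicity."""
    count, d = 0, 2
    while d * d <= k:
        while k % d == 0:
            k //= d
            count += 1
        d += 1
    return count + (1 if k > 1 else 0)

for n in range(1, 11):
    value = entry_product(n)
    assert mat_mul(mat_mul(a_power(n), B), A) == ((9*n + 1, 30*n + 3), (3, 10))
    assert value == 2 * 3**2 * 5 * (9*n + 1) * (10*n + 1)
    print(n, value, prime_factor_count(value))
```

The last column shows the number of prime factors of the product; it is small whenever $9n+1$ and $10n+1$ both have few prime factors, which is the situation a Brun-type sieve addresses.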

We start with a (very) elementary discussion of what a sieve is. Suppose one has
a finite set $X$ of integers and that one wishes to find primes or almost primes in $X$.
The most naive way to do this would be to try to adapt the sieve of Eratosthenes,
using the inclusion-exclusion principle to compute

$$\#\{\text{primes in } X\} = |X| - |X_2| - |X_3| - |X_5| - \dots + |X_6| + |X_{10}| + |X_{15}| + \dots - |X_{30}| - \dots$$

where $X_q$ is the set of elements of $X$ which are divisible by $q$. Unfortunately it
is well-known that, even when $X$ is an extremely simple set such as $\{1, \dots, n\}$, it
is not generally possible to evaluate $|X_q|$ sufficiently accurately to avoid the error
terms in this long sum blowing up. In this simple case just mentioned, for example,
we have $|X_q| = \lfloor n/q \rfloor$. However the floor function is rather unpleasant and it is
tempting to write instead $|X_q| = n/q + O(1)$, but then one finds that there are so
many $O(1)$ errors that the sieve of Eratosthenes becomes useless.
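The inclusion-exclusion sum can be written out explicitly for $X = \{1, \dots, n\}$. The sketch below sieves by the primes up to $z = \lfloor \sqrt{n} \rfloor$, which is one common reading of the formula above (an assumption made here); it compares the exact alternating sum of floors with the main term obtained by replacing each $\lfloor n/q \rfloor$ by $n/q$, and reports how many $O(1)$ errors the replacement introduces. All function names are hypothetical.

```python
from itertools import combinations
from math import isqrt, prod

def primes_up_to(z):
    """Primes <= z by trial division (adequate for the small z used here)."""
    return [p for p in range(2, z + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]

def exact_sieve(n, z):
    """Inclusion-exclusion count of integers in {1..n} with no prime factor <= z."""
    ps = primes_up_to(z)
    total = 0
    for r in range(len(ps) + 1):
        for subset in combinations(ps, r):
            q = prod(subset)                # squarefree q; the empty product is 1
            total += (-1) ** r * (n // q)   # |X_q| = floor(n / q)
    return total

def brute_force(n, z):
    """The same count, by checking every integer directly."""
    ps = primes_up_to(z)
    return sum(1 for k in range(1, n + 1) if all(k % p for p in ps))

def main_term(n, z):
    """The sum with floor(n/q) replaced by n/q, namely n * prod(1 - 1/p)."""
    return n * prod(1 - 1 / p for p in primes_up_to(z))

for n in (100, 400, 1000):
    z = isqrt(n)
    terms = 2 ** len(primes_up_to(z))       # number of O(1) errors incurred
    print(n, z, exact_sieve(n, z), round(main_term(n, z), 2), terms)
```

The number of terms doubles with each additional prime, whereas the quantity being estimated is at most $n$; with $z = \sqrt{n}$ the number of terms eventually dwarfs $n$, which is the blow-up referred to above.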

By and large, sieve theory is concerned with what it is possible to say about
primes or almost primes in $X$ given "reasonably nice" information about the size
of the sets $X_q$. Although the sieve of Eratosthenes is bad, other sieves fare rather
better. These other sieves are generally cleverly weighted versions of the sieve of
Eratosthenes, but we will not dwell upon their construction here. A typical example
of "reasonably nice" information about $|X_q|$ would be

$$|X_q| = \beta(q)|X| + r_q$$

for all squarefree $q \leq |X|^{\gamma}$, where $\beta(q)$ is some pleasant multiplicative function and
the error $r_q$ is small in the sense that $|r_q| \ll |X|^{1-\gamma}$ for some $\gamma > 0$. For example,
if $X = \{1, \dots, n\}$ then this is true with $\beta(q) = 1/q$ and for any $\gamma < 1$.

The fundamental theorem of the combinatorial sieve states, roughly speaking,
that such information is enough to find almost primes in $X$; in fact, one can even
estimate the number of almost primes. What is meant by "almost prime" - that
is, how many prime factors these numbers will have - depends on how large we can
take $\gamma$ as well as on the so-called dimension of the sieve, which has to do with the
average size of the quantities $\beta$. We will not delay ourselves by expanding upon
the details here. Let us instead refer the reader to [H] for the precise formulation
convenient to the application there and to the book [38] or the unpublished notes
[37] for a more wide-ranging discussion of sieves in general with full proofs.

All we shall take from the preceding discussion is the notion that, given a finite
set $X$ to be sieved in order to locate almost primes, we should be looking for good
asymptotics for the size of the sets $X_q$, $q$ squarefree. Returning to Theorem 6.2,
the first obvious question to answer is that of what the set $X$ to be sieved should
be. The set in which we wish to find almost primes is

$$\mathcal{A} := \left\{ x_1 x_2 x_3 x_4 : \begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix} \in \langle A, B \rangle \right\}.$$

Now $\mathcal{A}$ is of course an infinite set of integers. Rather than truncate in the usual
way and take $X = \mathcal{A} \cap \{1, \dots, N\}$, it is much more natural to truncate in a manner
that respects the group structure more. This we do by taking

$$X := \left\{ x_1 x_2 x_3 x_4 : \begin{pmatrix} x_1 & x_2 \\ x_3 & x_4 \end{pmatrix} \in \Sigma_m(A, B) \right\},$$

where

$$\Sigma_m(A, B) = \{U_1 U_2 \dots U_m : U_i \in \{A, A^{-1}, B, B^{-1}\}\}$$

is the set of words of length $m$ in $A$, $A^{-1}$, $B$ and $B^{-1}$ and $X$ is counted with
multiplicity so that $|X| = 4^m$.

Suppose that $p$ is a prime. Then $|X_p|$ is equal to the number of words
$w \in \Sigma_m(A, B)$, counted with multiplicity, which, when reduced modulo $p$, give rise to a
matrix in $\mathrm{SL}_2(\mathbf{F}_p)$ with at least one zero entry. Writing $S \subseteq \mathrm{SL}_2(\mathbf{F}_p)$ for the set of
such matrices, it is easy to compute that $|S| = 2(2p - 1)(p - 1)$. Now the number
of words $w \in \Sigma_m(A, B)$ which reduce modulo $p$ to some $x \in \mathrm{SL}_2(\mathbf{F}_p)$ is, in the
notation of the last section, precisely $\frac{1}{n}|X| \nu^{(m)}(x)$, and so

$$|X_p| = \frac{1}{n} |X| \sum_{x \in S} \nu^{(m)}(x).$$

However at the end of the last section we saw (footnote 15) that $\nu^{(m)}(x)$ becomes very
close to 1. In fact, in (5.9) we noted the bound $\nu^{(m)}(x) = 1 + O(n e^{-cm})$. Using this we
obtain

$$|X_p| = \beta(p)|X| + r_p$$

where $\beta(p) := 2(2p - 1)/p(p + 1)$ and $|r_p| \ll |X|^{1-\gamma}$ for some $\gamma > 0$.
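The count of matrices in $\mathrm{SL}_2(\mathbf{F}_p)$ with at least one zero entry, and hence the density $\beta(p)$, can be confirmed by brute force for small primes. The function name and the list of primes below are assumptions chosen for illustration.

```python
from fractions import Fraction

def zero_entry_count(p):
    """Brute-force count of matrices in SL_2(F_p) having at least one zero entry."""
    count = 0
    for a in range(p):
        for b in range(p):
            for c in range(p):
                for d in range(p):
                    if (a * d - b * c) % p == 1 and 0 in (a, b, c, d):
                        count += 1
    return count

for p in (2, 3, 5, 7, 11):
    count = zero_entry_count(p)
    group_order = p * (p * p - 1)            # |SL_2(F_p)|
    assert count == 2 * (2 * p - 1) * (p - 1)
    assert Fraction(count, group_order) == Fraction(2 * (2 * p - 1), p * (p + 1))
    print(p, count, Fraction(count, group_order))
```

The formula can also be seen by inclusion-exclusion: each of the four entries vanishes on $p(p-1)$ matrices, and the only pairs of entries that can vanish simultaneously are the two diagonal entries or the two off-diagonal entries, each on $p - 1$ matrices, giving $4p(p-1) - 2(p-1)$.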

Thus the expansion property of the Cayley graphs $(C(\pi_p(G), \pi_p(S)))_{p \text{ prime}}$ gives
exactly the kind of information that can be input into the combinatorial sieve!

There is, however, a very major caveat. What we have just said applies only to
$X_p$ when $p$ is a prime, and for the sieve one must understand $X_q$ when $q$ is a general
squarefree number. To do this requires the establishment of Theorem 5.1 for
the family $(C(\pi_q(G), \pi_q(S)))_q$, where now $q$ ranges over all squarefrees and not just
over primes. The broad scheme of the proof is the same, but every single ingredient
must be generalised to the more general setting, starting from the classification of
approximate subrings of $\mathbf{Z}/q\mathbf{Z}$. The situation here is more complicated because this
ring will in general have many approximate subrings, namely $\mathbf{Z}/q'\mathbf{Z}$ with $q' | q$. One
of the main technical results of jlj (occupying some 20 pages) is the statement that,
very roughly speaking, these are the only approximate subrings of $\mathbf{Z}/q\mathbf{Z}$. Although
this is a deeply technical argument of a type that this author would struggle to
summarise meaningfully even to an expert audience, it might be compared with
the 92-page proof [5] of the corresponding assertion without the squarefree assumption
on $q$. Thankfully (footnote 16) this is not required for the present application. Once the
classification of approximate subrings of $\mathbf{Z}/q\mathbf{Z}$ for $q$ squarefree is in place a suitable
analogue of Helfgott's argument is applied to roughly classify approximate
subgroups of $\mathrm{SL}_2(\mathbf{Z}/q\mathbf{Z})$. Even the statement of this result (Proposition 4.3 in the
paper) is rather technical. Finally, the majority of the argument outlined in the
last section in the case $q$ prime goes over without substantial change.

15 Either as a byproduct of the proof, or a consequence, of the expansion property of the family
of Cayley graphs $\Gamma_p = C(\pi_p(G), \pi_p(S))$.

FIGURE 1. Apollonian circle packing

This concludes our discussion of the proof of Theorem 6.2.

To conclude this survey, we wish to mention a beautiful application, mentioned in
the original paper [6] and in other articles such as [52], of these nonlinear sieving
ideas. This has to do with Apollonian packings such as the one in the attractive
image above.

16 This is one of the most extraordinarily long and technical arguments the author has ever seen.
The theory of approximate rings when there are many zero-divisors seems to be very difficult.

For a very pleasant and gentle introduction to Apollonian packings, see pQ.
Referring to Figure 1, inside each circle is an integer which represents the curvature
of that circle, or in other words the reciprocal of the radius. Some of the number
theory associated with the integers that arise in this way is discussed in the letter
[53] where, for example, it is shown that infinitely many of these curvatures are
prime and in fact that there are infinitely many touching pairs of circles with prime
curvature.

Now a pleasant exercise in Euclidean geometry gives a theorem of Descartes,
namely that the relation between the four integers $a_1, a_2, a_3, a_4$ inside four mutually
touching circles is given by

$$2(a_1^2 + a_2^2 + a_3^2 + a_4^2) = (a_1 + a_2 + a_3 + a_4)^2. \tag{6.1}$$

Examples of quadruples $(a_1, a_2, a_3, a_4)$ which are related in this way and easily
visible in the picture are $(13, 21, 24, 124)$ and $(13, 24, 37, 156)$.

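Take a quadruple $(C_1, C_2, C_3, C_4)$ of touching circles with curvatures

$$(a_1, a_2, a_3, a_4) = (13, 21, 24, 124).$$

There is another circle $C_1'$ tangent to $C_2$, $C_3$ and $C_4$, and it has curvature $a_1' = 325$.
To find a general relation between $a_1$ and $a_1'$ we may note that $a_1, a_1'$ are roots of
(6.1) regarded as a quadratic in $a_1$ and thereby obtain the relation

$$a_1' = -a_1 + 2a_2 + 2a_3 + 2a_4.$$

This may of course be written as

$$\begin{pmatrix} a_1' \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} = \begin{pmatrix} -1 & 2 & 2 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}.$$

That is, if one starts with some fixed vector such as $x_0 = (13, 21, 24, 124)$ then one
may obtain another quadruple of curvatures of circles in the Apollonian packing by
applying the matrix

$$S_1 := \begin{pmatrix} -1 & 2 & 2 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Similarly, with the matrices

$$S_2 := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & -1 & 2 & 2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad S_3 := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 2 & 2 & -1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

and

$$S_4 := \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 2 & 2 & 2 & -1 \end{pmatrix}$$

we can make the same assertion.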
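The four matrices and their effect on the quadruple $(13, 21, 24, 124)$ can be checked in a few lines. This is a minimal sketch; the function names are hypothetical, and the matrices are built from the relation $a_i' = -a_i + 2\sum_{j \neq i} a_j$ rather than typed in by hand.

```python
def swap_matrix(i):
    """The 4x4 integer matrix replacing coordinate i by -a_i + 2 * (sum of the others)."""
    rows = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
    rows[i] = [-1 if c == i else 2 for c in range(4)]
    return rows

def apply(M, v):
    """Matrix-vector product M v for a 4x4 matrix and a length-4 tuple."""
    return tuple(sum(M[r][c] * v[c] for c in range(4)) for r in range(4))

def descartes(v):
    """Check the Descartes relation 2 * (sum of squares) = (sum)^2."""
    return 2 * sum(a * a for a in v) == sum(v) ** 2

x0 = (13, 21, 24, 124)
images = [apply(swap_matrix(i), x0) for i in range(4)]   # S_1 x0, ..., S_4 x0

for image in images:
    print(image, descartes(image))
```

Each matrix is an involution, so applying it twice returns the original quadruple; this reflects the fact that the two roots of the quadratic are exchanged.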

This leads naturally to consideration of the orbit $\langle S \rangle x_0 \subseteq \mathbf{Z}^4$, where

$$S := \{S_1, S_2, S_3, S_4\},$$

every vector in which consists entirely of curvatures of circles in the Apollonian
packing. This puts us in a situation very similar to that studied in Theorem 6.2,
except now we appear to be dealing with a subgroup of $\mathrm{GL}_4(\mathbf{Z})$ rather than of
$\mathrm{SL}_2(\mathbf{Z})$.

It turns out, however, that this situation is essentially a two-dimensional one in
disguise, and for this we need to add to the list of areas of mathematics we touch
upon by hinting at Lie theory and special relativity! The matrices $S_1, S_2, S_3, S_4$
belong to $\mathrm{SO}_F(\mathbf{Z})$, the subgroup of $\mathrm{GL}_4(\mathbf{Z})$ consisting of $4 \times 4$ matrices with
determinant one which preserve the quadratic form
$F(x) = 2(x_1^2 + x_2^2 + x_3^2 + x_4^2) - (x_1 + x_2 + x_3 + x_4)^2$ (cf. (6.1)). By the standard
theory of quadratic forms (over $\mathbf{R}$) this is equivalent to the Lorentz form
$L(y) = y_1^2 + y_2^2 + y_3^2 - y_4^2$, and so we may identify $\mathrm{SO}_F(\mathbf{R})$ with the orthogonal
group $\mathrm{SO}(3,1)$ preserving this latter form. But it is very well-known that this group
admits $\mathrm{SL}_2(\mathbf{C})$ as a double cover: this is because the set $\{y : L(y) = -1\}$ may be
identified with the set of $2 \times 2$ hermitian matrices $M$ with determinant 1 via

$$(y_1, y_2, y_3, y_4) \mapsto \begin{pmatrix} y_4 + y_3 & y_1 - i y_2 \\ y_1 + i y_2 & y_4 - y_3 \end{pmatrix}$$

and so any $P \in \mathrm{SL}_2(\mathbf{C})$ gives rise to an element of $\mathrm{SO}(3,1)$ via the transformation
$M \mapsto PMP^*$.
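The identification rests on a one-line determinant computation, written out here as a sketch. The only facts used are that $\det P = 1$ and that $P^*$ denotes the conjugate transpose; the symbol $M(y)$ for the matrix attached to $y$ is notation introduced here for convenience.

```latex
% M(y) denotes the hermitian matrix attached to y = (y_1, y_2, y_3, y_4).
\det M(y)
  = (y_4 + y_3)(y_4 - y_3) - (y_1 - i y_2)(y_1 + i y_2)
  = y_4^2 - y_3^2 - y_1^2 - y_2^2
  = -L(y),
\\[4pt]
(P M P^*)^* = P M^* P^* = P M P^*,
\qquad
\det(P M P^*) = |\det P|^2 \det M = \det M .
```

So $M \mapsto PMP^*$ maps hermitian matrices to hermitian matrices and preserves $L$, and it is linear in $y$; since $P$ and $-P$ induce the same transformation, the correspondence is two-to-one.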

By lifting to this double cover the group $\langle S \rangle$ can be lifted to a subgroup of
$\mathrm{SL}_2(\mathbf{Z}[i])$. The proof of Theorem 6.2 goes through with relatively minimal changes,
although once again the group generated by $S_1$, $S_2$, $S_3$ and $S_4$ contains unipotents
and so, if the aim is simply to find infinitely many circles or pairs/quadruples of
touching circles with almost-prime curvatures, more elementary approaches work
just as well. Those elementary approaches do not, however, give sharp quantitative
results, whereas the techniques we have sketched do. To explain one such result,
imagine Figure 1 being generated as follows. Start with the outer circle (which has
curvature $-8$, the value forced by (6.1)) and the three largest inner circles, with
curvatures 13, 21 and 24. This is the first generation. The second generation consists
of those circles touching three from the first generation: they have curvatures 28, 37,
61 and 124. The third generation contains those new circles touching three circles from
either the first or the second generations: these have curvatures 45, 60, 69, 93, 124,
132, 133, 156, 220, 292, 301 and 325. Carry on in this vein: the $n$th generation will
contain $4 \cdot 3^{n-2}$ circles.
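The generations just described can be produced mechanically from the swap rule $a_i \mapsto -a_i + 2\sum_{j \neq i} a_j$, never undoing the swap just made. The sketch below takes the root quadruple to be $(-8, 13, 21, 24)$, which is the quadruple satisfying (6.1) that contains 13, 21 and 24 together with a negative curvature; the function names and the depth are assumptions made for illustration.

```python
def swap(v, i):
    """Replace coordinate i of the quadruple v by the other root of (6.1)."""
    w = list(v)
    w[i] = 2 * (sum(v) - v[i]) - v[i]
    return tuple(w)

def generations(root, depth):
    """Sorted curvatures by generation; generation 1 is the root quadruple itself."""
    result = [sorted(root)]
    frontier = [(root, None)]             # (quadruple, index swapped last)
    for _ in range(depth - 1):
        nxt = [(swap(v, i), i)
               for v, last in frontier for i in range(4) if i != last]
        result.append(sorted(w[i] for w, i in nxt))   # the newly created curvatures
        frontier = nxt
    return result

def is_prime(k):
    """Trial-division primality test; negative numbers, 0 and 1 are not prime."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

gens = generations((-8, 13, 21, 24), 8)
for number, curvatures in enumerate(gens, start=1):
    primes = sum(1 for a in curvatures if is_prime(a))
    print(number, len(curvatures), primes, round(3 ** number / number, 1))
```

The last two columns allow an informal comparison between the number of prime curvatures at generation $n$ and the quantity $3^n/n$; no conclusion about the constant should be drawn from so few generations.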

Theorem 6.3 (Bourgain, Gamburd, Sarnak). The number of circles at generation
$n$ which have prime curvature is bounded by $C 3^n / n$, for some absolute constant $C$.



We conclude by remarking that there are some very interesting unsolved questions
connected with Apollonian packings [24]. In that paper the question is raised of
whether, in Figure 1, a positive proportion of all positive integers appear as
curvatures. J. Bourgain has recently indicated to me that he and
Elena Fuchs have obtained new information on this question. See also [39] for an
asymptotic formula for the number of circles in the packing of curvature at most X.
It seems that the question of describing this set of integers more precisely remains
open: are they given, from some point on, by finitely many congruence conditions?